R Markdown Assignment

Question 3:

– The outputs of an R Notebook and an R Markdown document are rather similar. Writing an R Notebook is no different from writing an R Markdown document, since the text and code chunk syntax are exactly the same. The main difference is that when you execute a chunk in an R Markdown document, all of its code is sent to the console at once, but in an R Notebook only one line at a time is sent. This lets execution stop as soon as a line raises an error. R Markdown also uses Knit to run all the R code chunks and create a document, while a notebook uses Preview, which shows a rendered HTML copy of the contents of the editor. Unlike Knit, Preview does not run any R code chunks.

Question 4:

– The only difference in the input is the “output” field in the YAML header at the top of the file, which changes depending on which format you are knitting to. As for the output, each document conveys the same information, but because the formats are completely different, the results look quite different. Word looks the best to me, but I may be biased since I mainly use Word for writing polished papers.

Question 5:

I went to Kaggle and found some data to work on!

My data is on the top songs on Spotify from 2010 to 2019!

Here is a link to where I found the data: https://www.kaggle.com/leonardopena/top-spotify-songs-from-20102019-by-year/metadata

Question 6:

– Time for some data analysis!

setwd("~/Rmarkdown1_Carson_Green")
library(readr) 
top10s <- read_csv("top10s.csv")
## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   title = col_character(),
##   artist = col_character(),
##   `top genre` = col_character(),
##   year = col_double(),
##   bpm = col_double(),
##   nrgy = col_double(),
##   dnce = col_double(),
##   dB = col_double(),
##   live = col_double(),
##   val = col_double(),
##   dur = col_double(),
##   acous = col_double(),
##   spch = col_double(),
##   pop = col_double()
## )
songs <- top10s
head(songs)
## # A tibble: 6 x 15
##      X1 title artist `top genre`  year   bpm  nrgy  dnce    dB  live   val
##   <dbl> <chr> <chr>  <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     1 Hey,~ Train  neo mellow   2010    97    89    67    -4     8    80
## 2     2 Love~ Eminem detroit hi~  2010    87    93    75    -5    52    64
## 3     3 TiK ~ Kesha  dance pop    2010   120    84    76    -3    29    71
## 4     4 Bad ~ Lady ~ dance pop    2010   119    92    70    -4     8    71
## 5     5 Just~ Bruno~ pop          2010   109    84    64    -5     9    43
## 6     6 Baby  Justi~ canadian p~  2010    65    86    73    -5    11    54
## # ... with 4 more variables: dur <dbl>, acous <dbl>, spch <dbl>, pop <dbl>
str(songs)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 603 obs. of  15 variables:
##  $ X1       : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ title    : chr  "Hey, Soul Sister" "Love The Way You Lie" "TiK ToK" "Bad Romance" ...
##  $ artist   : chr  "Train" "Eminem" "Kesha" "Lady Gaga" ...
##  $ top genre: chr  "neo mellow" "detroit hip hop" "dance pop" "dance pop" ...
##  $ year     : num  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ bpm      : num  97 87 120 119 109 65 120 148 93 126 ...
##  $ nrgy     : num  89 93 84 92 84 86 78 76 37 72 ...
##  $ dnce     : num  67 75 76 70 64 73 75 52 48 79 ...
##  $ dB       : num  -4 -5 -3 -4 -5 -5 -4 -6 -8 -4 ...
##  $ live     : num  8 52 29 8 9 11 4 12 12 7 ...
##  $ val      : num  80 64 71 71 43 54 82 38 14 61 ...
##  $ dur      : num  217 263 200 295 221 214 203 225 216 235 ...
##  $ acous    : num  19 24 10 0 2 4 0 7 74 13 ...
##  $ spch     : num  4 23 14 4 4 14 9 4 3 4 ...
##  $ pop      : num  83 82 80 79 78 77 77 77 76 73 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   X1 = col_double(),
##   ..   title = col_character(),
##   ..   artist = col_character(),
##   ..   `top genre` = col_character(),
##   ..   year = col_double(),
##   ..   bpm = col_double(),
##   ..   nrgy = col_double(),
##   ..   dnce = col_double(),
##   ..   dB = col_double(),
##   ..   live = col_double(),
##   ..   val = col_double(),
##   ..   dur = col_double(),
##   ..   acous = col_double(),
##   ..   spch = col_double(),
##   ..   pop = col_double()
##   .. )

– About my Data:

603 observations of 15 variables:

X1 - Row number (an unnamed index column in the CSV)

title - The song’s title

artist - The song’s artist

top genre - The genre of the track

year - The song’s year on the Billboard chart

bpm - Beats Per Minute - The tempo of the song.

nrgy - Energy - The higher the value, the more energetic the song.

dnce - Danceability - The higher the value, the easier it is to dance to the song.

dB - Loudness (dB) - The higher the value, the louder the song.

live - Liveness - The higher the value, the more likely the song is a live recording.

val - Valence - The higher the value, the more positive the mood of the song.

dur - Length - The duration of the song in seconds.

acous - Acousticness - The higher the value, the more acoustic the song.

spch - Speechiness - The higher the value, the more spoken words the song contains.

pop - Popularity - The higher the value, the more popular the song.
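The dur values (roughly 200 to 300) look like seconds, so, assuming that unit, a small helper can show them as minutes and seconds. This is a base R sketch using the first ten dur values printed by str(songs) above; format_duration is a hypothetical name, not part of any package:

```r
# First ten values of songs$dur, copied from the str(songs) output
dur <- c(217, 263, 200, 295, 221, 214, 203, 225, 216, 235)

# Hypothetical helper (assumes durations are in seconds):
# turn each duration into an "m:ss" string
format_duration <- function(seconds) {
  sprintf("%d:%02d", seconds %/% 60, seconds %% 60)
}

format_duration(dur)
```

On the full data set the same call would be format_duration(songs$dur).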

Question 7:

– Time to plot stuff I’m interested in!

– First I want to see what’s up with the popularity variable!

hist(songs$pop, xlab = "Popularity", ylab = "Song Count", main = "Histogram of Relative Popularity", border="black", 
     col="green") 

Looks like most of the songs in this dataset have a high popularity value, which makes sense, since this dataset contains Spotify’s most popular songs of the past 10 years.

But there also seem to be some songs with low popularity scores, so I’ll look for songs with a popularity score lower than 40!

library(dplyr)
songslesspop <- filter(songs, pop < 40)

# I want to do a quick count to see how many songs are under 40 in popularity
count(songslesspop)
## # A tibble: 1 x 1
##       n
##   <int>
## 1    32
hist(songslesspop$pop, xlab = "Popularity", ylab = "Song Count", main = "Histogram of Less Popular songs", border="green", 
     col="blue")

The plot and the count show that there are not many songs with a popularity below 40, only 32! The histogram also suggests that 5 songs have a popularity rating of 0.
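Each histogram bar covers a range of popularity scores rather than a single value, so exact counts are safer to take straight from the data. A minimal base R sketch of the same two counts, using a made-up vector in place of songs$pop:

```r
# Made-up popularity scores standing in for songs$pop
pop <- c(0, 0, 3, 12, 39, 41, 76, 83)

# Songs below the cutoff of 40 (same condition as filter(songs, pop < 40))
n_below_40 <- sum(pop < 40)

# Songs with a popularity score of exactly 0
n_zero <- sum(pop == 0)

n_below_40  # 5
n_zero      # 2
```

On the real data this would be sum(songs$pop < 40) and sum(songs$pop == 0).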

For fun I’m gonna look at just those songs real quick!

zeropop <- filter(songs, pop == 0)

zeropop
## # A tibble: 5 x 15
##      X1 title artist `top genre`  year   bpm  nrgy  dnce    dB  live   val
##   <dbl> <chr> <chr>  <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1    51 Hello Marti~ big room     2010   128    98    67    -3    10    45
## 2   139 Blow~ P!nk   dance pop    2012   114    92    60    -3    25    75
## 3   268 Not ~ Justi~ dance pop    2014    86    73    59    -6    38    46
## 4   363 L.A.~ Fergie dance pop    2015   202    39    48    -8    26    27
## 5   443 Mill~ Adele  british so~  2016     0     0     0   -60     0     0
## # ... with 4 more variables: dur <dbl>, acous <dbl>, spch <dbl>, pop <dbl>

Looks like Hello by Martin Solveig,

Blow Me (One Last Kiss) by P!nk,

Not a Bad Thing by Justin Timberlake,

L.A.LOVE (la la) by Fergie, and

Million Years Ago by Adele

are the songs that made it onto Spotify’s top songs of the 2010s with a popularity rating of zero!
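One of these rows looks off: the Adele track has 0 for bpm, energy, danceability, liveness, and valence and -60 dB, which suggests missing measurements rather than a real song profile. Here is a sketch of flagging rows like that, using values copied from the zeropop output above (only a few columns) and an assumed rule of my own: bpm, energy, and danceability all equal to 0.

```r
# A few columns copied from the zeropop output above
zeropop_shown <- data.frame(
  title  = c("Hello", "Blow Me (One Last Kiss)", "Not a Bad Thing",
             "L.A.LOVE (la la)", "Million Years Ago"),
  artist = c("Martin Solveig", "P!nk", "Justin Timberlake", "Fergie", "Adele"),
  bpm    = c(128, 114, 86, 202, 0),
  nrgy   = c(98, 92, 73, 39, 0),
  dnce   = c(67, 60, 59, 48, 0),
  dB     = c(-3, -3, -6, -8, -60),
  stringsAsFactors = FALSE
)

# Assumed rule: a real track should not have zero tempo, energy, and danceability
suspect <- with(zeropop_shown, bpm == 0 & nrgy == 0 & dnce == 0)

# Rows that probably hold missing data rather than real measurements
zeropop_shown[suspect, c("title", "artist")]
```

On the full data set, the same condition could be applied with subset(songs, bpm == 0 & nrgy == 0 & dnce == 0).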

More Plotting for fun

– I want to look at which years had the highest energy levels and whether energy correlates with the duration of the songs!

library(ggplot2)

ggplot(songs, aes(x = year, y = nrgy, fill = dur)) +
  labs(title = "Total Energy by Year, colored by duration", x = "Year", y = "Total energy (sum over songs)") +
  geom_bar(stat = "identity") + theme_light()

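From the looks of it, there isn’t much of a correlation between energy and duration, although a stacked bar chart makes that hard to judge. Because geom_bar(stat = "identity") stacks one segment per song, each bar shows the total energy of all songs from that year, not the energy of individual songs. So the 2015 bar reaching over 6000 means that year’s songs add up to over 6000, most likely because 2015 has more songs in the dataset than other years. The songs in that bar do look shorter, though, since their segments are darker (in ggplot2’s default color scale, darker means a lower value of dur).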
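To see why stacked totals can mislead, here is a minimal base R sketch with made-up numbers (not from the dataset). It compares the per-year sum that the bar chart draws with the per-year mean, and uses cor() to put a single number on the energy/duration relationship:

```r
# Made-up toy data: two songs in 2014, three in 2015
toy <- data.frame(
  year = c(2014, 2014, 2015, 2015, 2015),
  nrgy = c(80, 60, 70, 90, 50),
  dur  = c(200, 240, 210, 180, 230)
)

# What the stacked bar chart shows: total energy per year
total_energy <- tapply(toy$nrgy, toy$year, sum)   # 2014: 140, 2015: 210

# Average energy per year, which does not depend on how many songs a year has
mean_energy <- tapply(toy$nrgy, toy$year, mean)   # 2014: 70, 2015: 70

# Pearson correlation between energy and duration across songs
r <- cor(toy$nrgy, toy$dur)
```

Here 2015 gets the taller bar only because it has more songs, even though the average energy is the same in both years. On the real data the equivalent calls would be tapply(songs$nrgy, songs$year, mean) and cor(songs$nrgy, songs$dur).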

Okay, time to do something other than a histogram.

– This time I’m gonna look at the top genres of the data set

# Let me look at what the top genre is 
tail(names(sort(table(songs$`top genre`))), 1)
## [1] "dance pop"
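The tail(names(sort(table(...))), 1) one-liner works, but calling which.max() on the frequency table says "most frequent" a bit more directly. This is a sketch using the genres of the first eight songs as printed in this document, standing in for songs$`top genre`:

```r
# Genres of the first eight songs, copied from the outputs in this document
genres <- c("neo mellow", "detroit hip hop", "dance pop", "dance pop",
            "pop", "canadian pop", "dance pop", "dance pop")

genre_counts <- table(genres)

# Name of the most frequent genre; which.max() returns the first maximum,
# so ties go to whichever genre the table lists first (alphabetical order)
top_genre <- names(which.max(genre_counts))
top_genre  # "dance pop"
```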
library(tidyverse)

# Songs top genre count

songsc <- songs %>% group_by(`top genre`) %>% add_tally()

songsctg <- filter(songsc, n > 30) 
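add_tally() attaches each group’s size as a new column n without collapsing the rows, which is what lets the filter above keep every song from a large genre. A base R sketch of the same idea uses ave(). The small toy data frame is made up, and its genre column stands in for `top genre`:

```r
# Made-up example data
toy <- data.frame(
  title = c("a", "b", "c", "d", "e"),
  genre = c("dance pop", "pop", "dance pop", "dance pop", "pop")
)

# For every row, count how many rows share its genre (like add_tally())
toy$n <- ave(seq_along(toy$genre), toy$genre, FUN = length)

# Keep rows whose genre appears at least 3 times (the analysis above uses n > 30)
common <- toy[toy$n >= 3, ]
```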

Let’s look:

options(tibble.width = Inf)

head(songsctg)
## # A tibble: 6 x 16
## # Groups:   top genre [3]
##      X1 title                artist        `top genre`   year   bpm  nrgy
##   <dbl> <chr>                <chr>         <chr>        <dbl> <dbl> <dbl>
## 1     3 TiK ToK              Kesha         dance pop     2010   120    84
## 2     4 Bad Romance          Lady Gaga     dance pop     2010   119    92
## 3     5 Just the Way You Are Bruno Mars    pop           2010   109    84
## 4     6 Baby                 Justin Bieber canadian pop  2010    65    86
## 5     7 Dynamite             Taio Cruz     dance pop     2010   120    78
## 6     8 Secrets              OneRepublic   dance pop     2010   148    76
##    dnce    dB  live   val   dur acous  spch   pop     n
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1    76    -3    29    71   200    10    14    80   327
## 2    70    -4     8    71   295     0     4    79   327
## 3    64    -5     9    43   221     2     4    78    60
## 4    73    -5    11    54   214     4    14    77    34
## 5    75    -4     4    82   203     0     9    77   327
## 6    52    -6    12    38   225     7     4    77   327

From my filtered data set (genres with more than 30 songs) and its counts, it seems that the top three song genres are:

  1. Dance Pop - 327 counts
  2. Pop - 60 counts
  3. Canadian Pop - 34 counts
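Since head(songsctg) only prints six rows, the ranking is easier to trust if the genre counts are sorted directly. Here is a base R sketch of that, run on a made-up vector of genre labels. On the full data, dplyr’s count(songs, `top genre`, sort = TRUE) should produce the same kind of ranking:

```r
# Made-up genre labels for illustration (not the real data)
genres <- c("dance pop", "pop", "dance pop", "canadian pop", "dance pop",
            "pop", "big room", "dance pop", "canadian pop", "pop")

# Frequency of each genre, largest first, keeping the top three
top3 <- head(sort(table(genres), decreasing = TRUE), 3)
top3
```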

In conclusion:

Justin Bieber was killing the game during the last decade.
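Canadian pop ranking third does not by itself say how many of those songs are Justin Bieber’s, so a quick check would count artists within that genre. This is a sketch using the six rows printed by head(songsctg) above as stand-in data. canadian_pop_artists is a hypothetical name, and on the real data the one-liner would be table(songs$artist[songs$`top genre` == "canadian pop"]):

```r
# The six rows printed by head(songsctg), reduced to two columns
shown <- data.frame(
  artist = c("Kesha", "Lady Gaga", "Bruno Mars", "Justin Bieber",
             "Taio Cruz", "OneRepublic"),
  genre  = c("dance pop", "dance pop", "pop", "canadian pop",
             "dance pop", "dance pop"),
  stringsAsFactors = FALSE
)

# Songs per artist within the "canadian pop" genre, most frequent first
canadian_pop_artists <- sort(table(shown$artist[shown$genre == "canadian pop"]),
                             decreasing = TRUE)
canadian_pop_artists
```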